Machine learning-based predictive modeling of depression in hypertensive populations

We aimed to develop prediction models for depression among U.S. adults with hypertension using various machine learning (ML) approaches. Moreover, we analyzed the mechanisms of the developed models. This cross-sectional study included 8,628 adults with hypertension (11.3% with depression) from the National Health and Nutrition Examination Survey (2011–2020). We selected several significant features using feature selection methods to build the models. Data imbalance was managed with random down-sampling. Six different ML classification methods implemented in the R package caret—artificial neural network, random forest, AdaBoost, stochastic gradient boosting, XGBoost, and support vector machine—were employed with 10-fold cross-validation for predictions. Model performance was assessed by examining the area under the receiver operating characteristic curve (AUC), accuracy, precision, sensitivity, specificity, and F1-score. For an interpretable algorithm, we used the variable importance evaluation function in caret. Of all classification models, artificial neural network trained with selected features (n = 30) achieved the highest AUC (0.813) and specificity (0.780) in predicting depression. Support vector machine predicted depression with the highest accuracy (0.771), precision (0.969), sensitivity (0.774), and F1-score (0.860). The most frequent and important features contributing to the models included the ratio of family income to poverty, triglyceride level, white blood cell count, age, sleep disorder status, the presence of arthritis, hemoglobin level, marital status, and education level. In conclusion, ML algorithms performed comparably in predicting depression among hypertensive populations. Furthermore, the developed models shed light on variables’ relative importance, paving the way for further clinical research.


Introduction
Depression is a frequent comorbidity among individuals with hypertension. A meta-analysis including 41 studies demonstrated that 26.8% of patients with hypertension had depression [1]. Notably, depression is associated with inadequate blood pressure control and hypertension complications [2]. It also negatively affects patients' adherence to treatments, health behavior, and quality of life, all of which may produce poorer long-term outcomes [3]. Given the significant burden depression poses on individuals with hypertension, its early prediction in this group is critical.
Machine learning (ML) now helps researchers design optimal predictive models within and across large datasets. In the United States, ML has been applied to train classification models that accurately identify depression based on several demographic, social, and clinical factors, either in the general population [4,5] or in people with chronic conditions such as diabetes [4] or heart disease [6]. However, such efforts to identify depression in individuals with hypertension are lacking.
Accordingly, it is necessary to develop ML models that can screen for depression and thereby facilitate early intervention in hypertensive populations. Model performance depends on both the research question and the type of data available. After appropriate data collection and processing (i.e., healthy data) and database building, evaluating which models are most suitable for the problem statement is crucial [7]. Several ML models relevant to the clinical diagnosis of depression have been developed to date. Some techniques, such as artificial neural networks (ANNs), utilize deep learning algorithms and have successfully been applied in precision psychiatry [8]. In practice, ANN is a powerful non-linear statistical tool that can model complex associations between variables to best predict an outcome based on large-scale empirical data [9]. Other scholars have used conventional ML models, including ensemble methods such as random forests, AdaBoost, stochastic gradient boosting, and XGBoost, as well as advanced kernel-based techniques such as the support vector machine (SVM). All of these models are useful in predicting psychiatric illness among patients with varying symptom severity, etiology, and clinical status [10].
In parallel, determining important factors contributing to depression is critical for early identification of individuals at risk of depression. In psychiatry, deep-learning algorithms are used to predict depression based on multimodal approaches such as using video, audio, and text streams. However, deep learning models based on textual and numeric data on clinical history and questionnaires tend to be non-explainable. Given that many ML methods, including ANN, are black boxes, they are limited in providing meaningful interpretations. Fortunately, nowadays several statistical packages offer an approach for researchers to interpret the models with visual presentations and clear interpretations of the analysis results [11].

Aim of the study
The aim of this study was to develop ML-based predictive models for depression in individuals with hypertension. Particularly, we implemented the classification process using six different ML algorithms: ANN, random forest, AdaBoost, stochastic gradient boosting, XGBoost, and SVM. In addition, the variable importance evaluation function with the caret package in R software [12] was used to interpret the operating mechanisms of the ANN model and that of the conventional ML-based classification models. Using this function, the predictors were ranked according to their relative contribution to the variable importance for each model.

Data source
We used the National Health and Nutrition Examination Survey (NHANES) datasets to train the model. NHANES is a periodic cross-sectional survey conducted by the Centers for Disease Control and Prevention to monitor trends in health and nutritional status in the non-institutionalized, community-dwelling US population. This survey has a complex multistage design to increase its representativeness. Approximately 5,000 individuals participate in the NHANES each year, and the data are reported in two-year cycles. Our study analyzed NHANES data from 2011-2020.

Population
The study population included a national sample of adults (≥ 40 years) with hypertension. Consistent with previous research [13], hypertension was defined as meeting one of the following criteria: (a) ever been told to have high blood pressure; (b) ever been told to take a prescription for hypertension; (c) now taking prescribed medicine for hypertension; or (d) having average systolic blood pressure greater than or equal to 140 mmHg or diastolic blood pressure greater than or equal to 90 mmHg in the NHANES examination section. During the NHANES 2011-2020 survey, 8,938 adults (aged ≥ 40 years) with hypertension were identified. Of those, 310 participants (3.5%) did not reply to the depression screening questionnaires (i.e., Patient Health Questionnaire-9 [PHQ-9]) and were hence excluded. This resulted in 8,628 participants eligible for inclusion.
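The inclusion rule above can be written as a small predicate. The following Python sketch is illustrative only (the study itself was run in R); the `Participant` class and its field names are hypothetical and do not correspond to actual NHANES variable codes.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical field names; they are not NHANES variable codes.
    age: int
    told_high_bp: bool      # (a) ever told they have high blood pressure
    told_to_take_rx: bool   # (b) ever told to take a prescription for hypertension
    taking_rx_now: bool     # (c) now taking prescribed medicine for hypertension
    mean_systolic: float    # average systolic blood pressure, mmHg
    mean_diastolic: float   # average diastolic blood pressure, mmHg


def has_hypertension(p):
    """True if any one of criteria (a) to (d) is met."""
    measured_high = p.mean_systolic >= 140 or p.mean_diastolic >= 90  # (d)
    return p.told_high_bp or p.told_to_take_rx or p.taking_rx_now or measured_high


def is_eligible(p, phq9_complete):
    """Adults aged 40 or older with hypertension who answered the PHQ-9."""
    return p.age >= 40 and has_hypertension(p) and phq9_complete
```

Applied to the counts reported above, 8,938 identified adults minus 310 without PHQ-9 responses leaves 8,628 eligible participants.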

Ethical review
Ethical approval was not required because NHANES is a publicly available dataset from which personal identifiers have been removed. In addition, the University of Washington institutional review board deemed this study exempt.

Measures
Inputs for predictive modeling. We selected factors potentially predicting depression based on data availability and previous studies' findings [5,14]. These included sociodemographic, behavioral, and clinical factors, as well as anthropometrics and biomarkers. In the NHANES, sociodemographic, behavioral, and clinical data are collected via home interview-administered questionnaires, while trained staff collect anthropometrics and biomarkers using mobile exam units.
Sociodemographic factors include age, race/ethnicity, gender, marital status, education level, the ratio of family income to poverty, insurance status, and time spent uninsured in the past year. Behavioral factors include smoking status, minutes of sedentary activity, vigorous work activity, moderate work activity, walking or cycling, vigorous recreational activity, and moderate recreational activity. Clinical factors include the presence of arthritis, kidney disease, asthma, liver disease, cancer or a malignancy of any kind, cardiovascular disease, and sleep disorders. Participants were considered as prevalent cardiovascular disease cases if they had ever been told by a doctor that they had any of the following conditions: congestive heart failure, coronary heart disease, angina/angina pectoris, heart attack, or stroke. Sleep disorder was assessed with the question "Have you ever told a doctor or other health professional that you have trouble sleeping?" Anthropometrics and biomarkers include segmented neutrophils number, white blood cell count, red cell distribution width, mean cell volume, platelet count, gamma glutamyl transferase, alanine aminotransferase, alkaline phosphatase, eosinophils number, basophils number, glycohemoglobin, triglycerides, total cholesterol, body mass index, direct high-density lipoprotein cholesterol, sodium, total bilirubin, hemoglobin, hematocrit, albumin, monocyte number, lymphocyte number, potassium, uric acid, and creatinine.
Outputs for predictive modeling: Depression. For the diagnosis of depression, we used the PHQ-9 [15]. The PHQ-9 consists of 9 items based on the diagnostic criteria for depression from the Diagnostic and Statistical Manual of Mental Disorders IV. Each item is rated on a 4-point scale based on the frequency of depressive symptoms (0 = "not at all" to 3 = "nearly every day"). Scores range from 0 to 27, with higher scores indicating a higher severity of depression.
We selected 10 as our threshold for the diagnosis of depression, as this is a reliable threshold with acceptable sensitivity and specificity for detecting major depressive disorders [16]. During 2011-2020, 8,628 participants were assessed for depression and 976 received the diagnosis of depression (11.3%).
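As a minimal illustration of the outcome definition, the sketch below sums the nine PHQ-9 item scores and applies the threshold of 10. It is written in Python for illustration only; the function names are hypothetical.

```python
def phq9_total(items):
    """Sum the nine PHQ-9 item scores, each of which must be 0, 1, 2, or 3."""
    if len(items) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item scores")
    if any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("each PHQ-9 item must be scored 0 to 3")
    return sum(items)


def is_depressed(items, threshold=10):
    """Binary outcome label: total score at or above the threshold."""
    return phq9_total(items) >= threshold
```

With this labeling, the reported 976 cases among 8,628 participants correspond to a prevalence of 976 / 8,628, or about 11.3%.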

Data analysis steps
All analyses were conducted in R version 4.1.1 and its packages. Most of the input variables used in the current study had a missing data rate of < 7.0%. Missing values were replaced by the mean for continuous data and by the mode for categorical data by using the "na.roughfix()" function in the randomForest package. Descriptive and bivariate analyses assessed baseline characteristics depending on participants' depression status. Further, the ML findings were structured using the following steps: (1) feature selection, (2) data pre-processing and partitioning, (3) managing data imbalance, (4) ML analysis for predictive classification modeling, and (5) ranking variable importance.
Feature selection. Because not all features carry significant information, we conducted feature selection to discard redundant features that could degrade model performance. We evaluated 3 data-driven feature selection methods, namely (1) 2 ML algorithms (Boruta and the least absolute shrinkage and selection operator [LASSO]) and (2) stepwise backward elimination, and selected whichever produced the most informative feature set for our final prediction.
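The imputation rule described above (mean for continuous variables, mode for categorical ones) can be sketched as follows. This Python snippet follows the description in the text rather than reproducing the R function; the name `impute_column` and the use of `None` for missing values are assumptions made for illustration.

```python
from statistics import mean, mode


def impute_column(values, categorical):
    """Fill missing entries (None) with the mode or the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mode(observed) if categorical else mean(observed)
    return [fill if v is None else v for v in values]
```

For example, `impute_column([1.0, None, 3.0], categorical=False)` fills the gap with 2.0, the mean of the two observed values.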
Boruta is a wrapper algorithm built around random forests that finds all relevant attributes by comparing the importance of the randomized copies of the attributes with that of the original attributes [17]. LASSO is a regression method, which performs variable selection and regularization using L1 penalty to shrink regression coefficients of the redundant features to zero [18]. In the current study, the penalty parameter lambda (λ) was tuned using 10-fold cross-validation based on the minimum partial likelihood deviance. The features with nonzero coefficients in optimal λ were selected and used in the model. Backward elimination removes predictor variables insignificant to the model based on the Akaike information criterion value, until the ideal number of predictor variables is achieved [19]. We used the Boruta package for Boruta, glmnet package for LASSO, and MASS package for stepwise backward elimination.
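To make the backward elimination step concrete, the following Python sketch shows the greedy loop in isolation. It is not the MASS implementation: the `aic` argument is a hypothetical callable that returns the Akaike information criterion (2k minus twice the maximized log-likelihood, with k parameters) of a model fitted on a given feature subset, and `toy_aic` is a made-up stand-in so that the loop can be run without fitting any model.

```python
def backward_eliminate(features, aic):
    """Greedy backward elimination: drop one feature at a time while AIC improves."""
    current = list(features)
    best = aic(current)
    while len(current) > 1:
        # Score every candidate model that omits exactly one remaining feature.
        scored = [(aic([f for f in current if f != drop]), drop) for drop in current]
        candidate_aic, candidate_drop = min(scored)
        if candidate_aic >= best:
            break  # no single removal lowers the AIC any further
        current.remove(candidate_drop)
        best = candidate_aic
    return current, best


def toy_aic(subset):
    """Made-up criterion: 2 per feature kept, plus a large cost per useful feature dropped."""
    useful = {"income_ratio", "age", "sleep_disorder"}
    return 2 * len(subset) + 50 * len(useful - set(subset))


kept, final_aic = backward_eliminate(
    ["income_ratio", "age", "sleep_disorder", "noise_1", "noise_2"], toy_aic
)
```

In this toy run the two noise features are removed and the three useful ones are retained, because removing any of them would raise the criterion.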
Data pre-processing and partitioning. To transform the raw data into an appropriate format for ML model building, the datasets with the finalized features were pre-processed using the center and scale transformation methods of the "preProcess()" function in the caret package. Furthermore, all categorical variables were one-hot encoded as two-level factor variables. After the data were pre-processed, they were randomly divided into 2 sets: training (80.0%) and testing (20.0%). The training dataset was used to "train" and finalize the optimal model, whereas the testing dataset was used to evaluate the performance of the final model.
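The two operations in this step, centering and scaling a numeric column and splitting the rows 80/20, can be sketched in a few lines. This Python version is illustrative only and is not the caret code; it uses a simple random split, and the seed and function names are assumptions.

```python
import random
from statistics import mean, stdev


def center_scale(column):
    """Subtract the column mean and divide by the sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

After `center_scale`, a column has mean 0 and standard deviation 1, which puts predictors measured in different units on a common scale.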
Managing data imbalance. The dataset used had extreme class imbalance (depression prevalence of 11.3%). ML algorithms tend to be biased toward the majority class and can therefore return high accuracy that is misleading. We used the random down-sampling technique provided by the ROSE package [20] to handle this imbalance. Random down-sampling was chosen as it performed better than random oversampling or the synthetic minority oversampling technique (SMOTE) [21] in our datasets, despite its simplicity. After down-sampling, the sizes of the 2 classes in the training data were similar (non-depressed: 745; depressed: 781).
ML analysis for predictive classification modeling. We performed classification modeling to predict the binary class of depression using features returned by feature selection methods. The modeling function in the caret package was used for all predictions to ensure uniform execution: 'nnet' (ANN), 'rf' (random forest), 'adaboost' (AdaBoost), 'gbm' (stochastic gradient boosting), 'xgbTree' (XGBoost), and 'svmLinear' (SVM) (see Table 1 for further details on each specific method). To identify and decrease the error values during model fitting, determining the optimal hyperparameters for each of the ML algorithms is crucial. Hyperparameter tuning is streamlined and easy to use in caret [12]. By default, the caret package automatically tunes the hyperparameter values for each algorithm using the package's standard grid set of candidate models. We then applied these hyperparameters to the down-sampled training data to fit the model parameters. The fitted models were then applied to the testing data to evaluate model performance. Cross-validation was used to select the best set of parameters for the final prediction; all models were trained with 10-fold cross-validation with 3 replications.
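Random down-sampling keeps every minority-class record and draws a random subset of the majority class. The Python sketch below balances the classes exactly, which is an assumption made for simplicity; it is not the ROSE implementation, and the class sizes reported above were similar rather than identical.

```python
import random


def down_sample(rows, label_of, seed=42):
    """Randomly reduce every class to the size of the smallest class."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(label_of(row), []).append(row)
    n_min = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        # Sampling without replacement; the smallest class is kept in full.
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced
```

Down-sampling is applied to the training data only, so the testing data keep the original class proportions.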
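The resampling scheme described above, 10-fold cross-validation repeated 3 times, produces 30 train/validation splits per candidate hyperparameter setting. The Python sketch below generates such splits as row indices. It is illustrative only and is not how caret builds its folds internally; the function name and seed are assumptions.

```python
import random


def repeated_kfold(n_rows, k=10, repeats=3, seed=42):
    """Yield (repeat, train_indices, validation_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(repeats):
        order = list(range(n_rows))
        rng.shuffle(order)  # a fresh random partition for every repeat
        folds = [order[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            train = [i for i in order if i not in held]
            yield rep, train, held_out


splits = list(repeated_kfold(50))
```

Within each repeat, every row is held out exactly once, so each candidate model is scored on all training rows without ever being scored on a row it was fitted to.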
Table 1. ML algorithms used in the current study [24].

ANN
ANN is a group of interconnected artificial neurons that utilizes a mathematical model or computational model to process information. The generic structure of a basic ANN comprises a series of nodes arranged in 3 layers (input, hidden, and output layers). The input nodes and the output node of an ANN correspond to the predictor variables and outcome variable, respectively. The nodes in the hidden layer are intermediate unobserved values that allow the ANN to model complex nonlinear associations between the input nodes and the output node. The nodes in different layers are connected by weights.

Random forest
Random forest is a tree-based ensemble method that utilizes parallel decision trees built on subsets of the data to develop an optimal predictive model. Each tree in the random forest casts a vote based on its prediction, and the classification with the most votes becomes the overall model's prediction.

AdaBoost
AdaBoost is also an ensemble method like random forest. The core principle of AdaBoost is to fit a sequence of "weak learners" (i.e., models that are only slightly better than random guessing) to repeatedly modified data. All predictions are then combined through a weighted majority vote (or sum) to generate the final prediction.
Hyperparameters: nIter = 100, method = Adaboost.M1

Stochastic gradient boosting
Stochastic gradient boosting is another ensemble technique. It iteratively builds several small decision trees, each based on a random subset of the data, with each additional tree emphasizing observations poorly modeled by the existing collection of trees. Ultimately, observations are assigned a class based on the most common classification among the trees.

XGBoost
XGBoost implements gradient boosting with decision trees as the underlying learners. Whereas random forest employs individual trees in parallel to solve the same problem, XGBoost builds individual trees sequentially. Each tree is trained to resolve the prediction error remaining following the prior tree and thereby improves prediction. This offers another approach to building more complex and accurate models with trees while controlling individual tree depth and complexity.

Model performance was assessed by examining the area under the receiver operating characteristic curve (AUC), accuracy, precision, sensitivity, specificity, and F1-score. AUC is a widely used metric for binary classification problems and provides a representative summary of the performance of a classifier. Generally, AUC values of 0.8 to 0.9 are considered good and above 0.9 are considered excellent [22]. A higher F1-score signifies fewer false positives and fewer false negatives, which implies correct identification of the classes [22]. Both AUC and F1-score are well-known metrics for classification performance evaluation over an imbalanced dataset [23]. Accuracy, precision, sensitivity, specificity, and F1-score were evaluated using a confusion matrix. S1 Table details the calculation methods for these diagnostic performance measures.
Ranking variable importance. We used the 'varImp()' function of the caret package to determine the relative predictor importance for each model. Using this function, the predictors were ranked according to their relative contribution to the variable importance for each model.
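The performance measures used in this section follow directly from the four cells of a confusion matrix, and AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The Python sketch below uses these standard definitions for illustration; it is not the code used in the study, and which class is treated as "positive" is left to the caller.

```python
def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def confusion_metrics(tp, fp, tn, fn):
    """Standard measures from true/false positive and negative counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f1": f1_score(precision, sensitivity),
    }


def auc(scores_pos, scores_neg):
    """Share of positive/negative pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))
```

As a consistency check, a precision of 0.969 and a sensitivity of 0.774 give an F1-score of about 0.86 under this formula.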

Results
The results were "unweighted" as we could not accommodate a complex survey design into the analyses due to the current lack of ML methodologies for handling complex design features (e.g., sampling weights, strata, and primary sampling units). Hereafter, we refer to unweighted prevalence rates (or unweighted means) directly as prevalence rates (or means) and provide further discussion in the limitations section.

Comparison of baseline characteristics
Among the 8,628 adults in the sample with hypertension, 976 (11.3%) reported a clinical level of depression based on their PHQ-9 score. The depressed group was significantly younger than the nondepressed group. This group had a higher percentage of individuals with Mexican, other Hispanic, and "other" ancestry, whereas the nondepressed group had a higher percentage of non-Hispanic White, non-Hispanic Black, and non-Hispanic Asian individuals. Women comprised 63.8% of the depressed group. More than half of those in the nondepressed group were married or living with a partner and had a college degree or more education. The ratio of family income to poverty in the nondepressed group was significantly higher than that in the depressed group. Table 2 provides additional characteristics of participants subdivided by depression status.

Feature selection for modeling
Among the three different feature selection techniques, stepwise backward elimination showed the most substantial reduction in the number of features (from 47 to 30; see S2 Table and S1 Fig) and yielded the optimal predictive performance (see S3 Table). Thus, we primarily based our models on features selected from the stepwise backward elimination method (see Supporting information files for full feature selection results). Features selected by stepwise backward elimination included: age, race/ethnicity, gender, marital status, education level, the ratio of family income to poverty, time spent uninsured in the past year, smoking status, minutes of sedentary activity, vigorous work activity, vigorous recreational activity, moderate recreational activity, all clinical factors, white blood cell count, platelet count, alanine aminotransferase, glycohemoglobin, triglycerides, total cholesterol, sodium, hemoglobin, lymphocyte number, uric acid, and creatinine.

ML analysis for predictive modeling
In the current study, the ANN model trained with selected features (n = 30) was developed with 1 hidden layer, and the weight decay was set at 0.09 based on cross-validation as it yielded the highest test set accuracy. ANNs were also tested using the keras package with varying depth, size of hidden layers, and regularization (dropout and L2 penalty); however, no combination of hyperparameters tested yielded a higher AUC than the caret implementation. Table 1 summarizes the hyperparameters used in other models. The six ML models' classification performance based on the features selected from stepwise backward elimination is illustrated in an ROC curve (Fig 1). Of all classification models, ANN achieved the highest AUC (0.813) and specificity (0.780) in predicting depression. SVM predicted depression with the highest accuracy (0.771), precision (0.969), sensitivity (0.774), and F1-score (0.860). All classifiers achieved better classification accuracy than a random model (the gray diagonal line indicating AUC = 0.500 in Fig 1). Table 3 details the performance of the other models.

Important features ranked by ML algorithm
The selected features contributed differently to each model. We combined the top 20 strongest contributing features from the six models and ranked them based on their inclusion in the models. In total, these models returned 24 top-20 features, nine of which were within the top 20 in at least five models, based on their rankings in each model (Table 4). The most frequent and important features include: the ratio of family income to poverty, triglyceride level, white blood cell count, age, sleep disorder status, the presence of arthritis, hemoglobin level, marital status, and education level.
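The aggregation described above, counting how many models place each feature among their top 20 and keeping those that appear in at least five, can be sketched as follows. This Python snippet is illustrative only; the function name and the input layout (one importance-ordered list of feature names per model) are assumptions.

```python
from collections import Counter


def frequent_top_features(rankings, top_n=20, min_models=5):
    """Count top-n appearances per feature and keep those in at least min_models models."""
    counts = Counter()
    for ranked in rankings.values():
        counts.update(ranked[:top_n])  # each model contributes its top-n features once
    frequent = sorted(f for f, c in counts.items() if c >= min_models)
    return frequent, counts
```

The length of `counts` is the number of distinct features that reach the top n in any model (24 in this study), and `frequent` lists those shared by at least `min_models` models (nine in this study).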

Discussion
The ML models developed in this study showed comparable performance in predicting depression among U.S. adults with hypertension. ANN specifically achieved the highest performance in terms of AUC and specificity. In the few studies that have evaluated them, ANN-based models for predicting psychiatric illnesses have consistently outperformed conventional ML methods and traditional regression models [5,25,26]. Our findings add to the evidence of ANN models' power as computational tools for early diagnosis of depression in individuals with chronic conditions. Nevertheless, although not directly comparable, our ANN model's predictive ability was lower than that found in previous ML studies, in which AUC values ranged from 0.910 to 0.920 [5] or were equal to 0.913 [27].

Table footnotes. a The variable of family annual income was computed as a ratio of family income to poverty guidelines using the federal poverty level guidelines, which were available at https://aspe.hhs.gov/prior-hhs-poverty-guidelines-and-federal-registerreferences. The poverty index is a ratio measuring the household income to the poverty threshold after accounting for inflation and family size. b Participants were considered as prevalent cardiovascular disease cases if ever told by a doctor that they had any of the following conditions: congestive heart failure, coronary heart disease, angina/angina pectoris, heart attack, or stroke.

PLOS ONE

Notably, SVM also exhibited strong predictive performance with respect to other diagnostic measures, including accuracy, precision, sensitivity, and F1-score. Like neural network approaches, SVM has recently gained importance for predicting the diagnosis and prognosis of a range of psychiatric and neurological disorders, including Alzheimer's disease, schizophrenia, and depression [28–30]. Of note, the SVM has a high predictive accuracy when using large biomedical datasets comprising a small number of records with a large number of variables (i.e., insensitivity to high-dimensional data) and is less affected by imbalanced datasets [23,31], making it suitable for our analysis. However, SVMs do not always show a high predictive accuracy; in several papers, RF-based models have been reported to perform equally well or better than other algorithms [32–36]. For instance, in the studies by Mousavian et al. [33] and de Souza Filho et al. [34], RF outperformed SVM in predicting depression. Similarly, RF had the best accuracy in predicting anxiety, depression, and stress in the study by Priya et al. [35].
The ratio of family income to poverty was the most important feature across all models. This result accords with findings of recent ML studies [4,37], which reported the ratio of family income to poverty (or family income itself) as the most crucial feature in predicting depression among community-dwelling adults. Kang and Kim [38] also have noted that the associations of hypertension with symptoms and diagnosis of depression differ by income level. In addition to income, factors such as age, marital status, and education (all of which are "social determinants of mental health," per Carod-Artal [39]) were also among the most important features across the models. Age has consistently been identified as a critical factor in explaining the variability in depression prevalence rates [40]. Marital status is one of the most important social factors affecting various life outcomes, especially mental health [39]. Education strongly affects depression as it heightens cognitive ability, provides economic and social resources, and leads to positive health behaviors [41].
Based on our results, we recommend that, in addition to the usual variables, healthcare providers collect information regarding these social determinants at the earliest possible opportunity to prevent depression and to screen individuals with hypertension for depression risk.
We identified several important biomarkers across the models: triglycerides, white blood cell count, and hemoglobin. In Lin et al.'s study [4] using random forest, triglycerides were an essential variable in building a depression prediction model that included the general population and individuals with a high body mass index. Sharma and Verbeke [14] observed that triglycerides were an important biomarker for diagnosing and distinguishing depression cases from healthy cases using the XGBoost algorithm. Moreover, non-ML studies demonstrate that higher depression scores are associated with an enhanced inflammatory state, as evidenced by higher levels of hematological inflammatory markers including white blood cells, both in individuals free of disease [42] and those with stable heart disease [43]. Finally, among less known modifiable risk factors for depression, anemia has attracted increasing attention. Anemia is often associated with conditions (e.g., cancer, chronic renal failure, malnutrition, etc.) that usually precede depressed mood [44]. Symptoms of low hemoglobin levels (e.g., paleness, fatigue, dizziness, shortness of breath during physical activity, etc.) also frequently occur alongside depressive symptoms [45]. Some have proposed that anemia has a pathophysiological role in depression due to chronic hypo-oxygenation [46,47], which further supports the importance of including hemoglobin levels in our model.
Sleep disorder status was also important in building the predictive model. Disturbed sleep is associated with metabolic, neuroendocrine, and inflammatory changes, resulting in alterations to mental functioning [48]. Ma and Li [49] have also reported a significant correlation between sleep quality and depression in older patients with hypertension. Of note, assessment of sleep disorders in individuals with hypertension is important not only for preventing comorbid debilitating mental health disorders but also for mitigating their adverse influence on hypertension management. According to one hypothesis, sleep alterations may impair adaptation to stress through allostasis and contribute to allostatic load, thereby compromising stress resiliency and amplifying blood pressure [50]. Faraut et al. [51] found that participants with short and long sleep durations were more likely to have depressive symptoms, higher social vulnerability, and higher hypertension rates. One limitation in our interpretation is that the type of sleep disorder was not reported in the NHANES survey. Therefore, we could not explore the specific association between different sleep disorders and depression in the population with hypertension, which should be addressed in future studies. Such information may have helped derive more precise insights in preventing depression in our target population.
Lastly, arthritis was among our models' most important features. It has long been recognized that arthritis and depression are associated [52]. Individuals with arthritis fear long-term pain, loss of function, work disability, and possible socioeconomic effects of the disease [53]. With these rational fears and physical challenges, most patients with arthritis exhibit clinically significant levels of low self-esteem and self-stigma, which explains the high prevalence of depressive disorders among these patients [54,55]. Of importance, depression and arthritis increase the burden on the healthcare system, with increased provider visits, more pain complaints, and increased requests for pain medication that complicate hypertension management [56]. Therefore, greater primary preventive effort for depression should be directed toward individuals with hypertension and arthritis, for instance, through routine depression screening.

Limitations and implications
Our study has several limitations. First, although our ANN and SVM models provided the best performance on the test set with the highest AUC and F1-score, respectively, the clinical utility of the models remains speculative at this stage. This is mainly because our results were based on self-reported data: the diagnosis of depression was based on a self-reported questionnaire in the present study without validation using actual clinical records or direct patient examination. In addition, our sample size was relatively small compared to other population studies. By building a more extensive database for training a prediction model, the variations observed among adults with depression can be more thoroughly incorporated. In the future, this may result in models with true clinical utility. Second, associations between inputs and outputs for predictive modeling do not imply causal relationships as the current study used cross-sectional data; for instance, the relationship between arthritis and depression may have been bidirectional. Another limitation is the inability of most ML algorithms to account for complex survey designs with multi-stage stratified sampling, which is often used for household surveys like the NHANES. Therefore, our sample should not be considered truly representative of community-dwelling adults with depression between 2011 and 2020.
Finally, recognizing the limitations of prevalence analyses is important. For instance, in the current study, the estimates of hypertension prevalence were drawn from many sources including survey data. However, an isolated survey response does not guarantee the diagnosis of the disease. In addition, antihypertensive medications can be prescribed to patients without hypertension; for example, the use of angiotensin-converting enzyme inhibitors for diabetic patients with chronic kidney disease. Accordingly, some samples included in the study may not have been representative of the population we targeted. Furthermore, the results should be interpreted cautiously since prevalence data alone cannot completely explain the disease dynamics. For example, in the case of a sleep disorder, a participant could have returned to sleep normalcy after medical treatment. Therefore, the question about the prevalence of sleep disorder does not assess whether the patient continued to have a sleeping disorder. Despite its limitations, this study is the first to predict depression among hypertensive populations using multiple ML approaches. This study also presents a potential method to aid the preliminary screening of depression among patients with hypertension, before a formal clinical diagnosis.
Several implications should be considered. First, our use of cross-sectional data to evaluate the ML models may have introduced bias in performance estimation, as the ML models' performance should ideally be evaluated on newly collected data or a separate dataset for reliability. Further studies should address this limitation. Second, apart from including traditional risk factors, including different types of inputs could help further improve depression prediction [34]. For example, owing to the fundamental limitations of the original NHANES survey, we did not include quality of life variables, such as familial relationships, social relationships, or leisure activity, which can help in better predicting depression [37]. Third, the models developed in this study determined the variables' predictive importance, facilitating additional clinical research; for instance, the strongest features across the models could be used to further improve depression prediction in future studies. Finally, a larger volume of data from the healthy population would be preferable. With larger datasets, the methods employed will begin to vary and demonstrate improved validity [10]: particularly, the feature selection methods will improve performance, as they are likely to be affected by sample size; in addition, the k-fold cross-validation method can be utilized with larger k-values instead of the leave-one-out method to allow for larger sets on which to test prediction models and improve models' generalizability.

Conclusion
In the current study, ML algorithms performed comparably in predicting depression among U.S. adults with hypertension. Models with superior performance may aid in developing screening tools for depression among hypertensive adults in future studies. Furthermore, the risk factors for depression identified across the models may inform healthcare professionals to devise effective prevention strategies by focusing on at-risk individuals and may assist patients with hypertension with decisions regarding the use of diagnostic testing, treatments, or lifestyle changes.